Abstract

Background: Although technical advances in genomics and proteomics research have yielded a better 
understanding of the coding capacity of a genome, one major challenge remaining is the identification of all 
expressed proteins, especially those less than 100 amino acids in length. Such information can be particularly 
relevant to human pathogens, such as Trypanosoma brucei, the causative agent of African trypanosomiasis, since it 
will provide further insight into the parasite biology and life cycle. 

Results: Starting with 993 T. brucei transcripts, previously shown by RNA-Sequencing not to coincide with 
annotated coding sequences (CDS), homology searches revealed that 173 predicted short open reading frames in 
these transcripts are conserved across kinetoplastids with 13 also conserved in representative eukaryotes. Mining 
mass spectrometry data sets revealed 42 transcripts encoding at least one matching peptide. RNAi-induced 
down-regulation of these 42 transcripts revealed seven to be essential in insect-form trypanosomes with two also 
required for the bloodstream life cycle stage. To validate the specificity of the RNAi results, each lethal phenotype 
was rescued by co-expressing an RNAi-resistant construct of each corresponding CDS. These previously 
non-annotated essential small proteins localized to a variety of cell compartments, including the cell surface, 
mitochondria, nucleus and cytoplasm, suggesting that they play diverse biological roles in T. brucei. We
also provide evidence that one of these small proteins is required for replicating the kinetoplast (mitochondrial) DNA. 

Conclusions: Our studies highlight the presence and significance of small proteins in a protist and expose potential 
new targets to block the survival of trypanosomes in the insect vector and/or the mammalian host. 
Background

Recent advances in high-throughput sequencing tech- 
nologies have led to the discovery of a large number of 
transcripts originating from regions of the genome pre- 
viously thought to be silent [1]. One major challenge 
arising from these observations is to determine whether 
these transcripts code for a protein or should be classi- 
fied as non-coding RNAs. This task is rather overwhelm- 
ing, since a majority of these transcripts only have the 
potential to encode small proteins, generally less than 
100 amino acids (aa) [2,3]. Historically, an arbitrary cut- 
off for open reading frames of 100 aa was applied in 
genome annotation projects [4,5] and thus the extent 
and functional significance of small open reading frames 
(sORFs) remain a largely unexplored territory in many
organisms. Nevertheless, copious reports clearly indi- 
cate that they play crucial biological roles, including 
protection against pathogens [6,7], signal transduction 
[8], serving as molecular chaperones [9], developmental 
regulation [10-13] and even calcium transport in cardiac
muscle contraction [14].

Several proteins encoded by sORFs have been identified 
serendipitously by biochemical methods as part of a com- 
plex or the product of a processed precursor protein. One 
example is the Drosophila tarsal-less (tal) gene, originally
annotated as non-coding, but later shown to encode three 
small proteins with a crucial role in fly development [13]. 
Several studies have used genome-wide approaches to 
gauge the prevalence of sORFs. When examining potential 
small proteins in Drosophila melanogaster, Ladoukakis et al. 
identified 4,561 sORFs that were conserved in a closely re- 
lated species, Drosophila pseudoobscura [15]. Synteny, 



© 2014 Ericson et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain
Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article,
unless otherwise stated.



Ericson et al. BMC Biology 2014, 12:14
http://www.biomedcentral.com/1741-7007/12/14






evidence of transcription and nucleotide substitution, nar- 
rowed the 4,561 to a more conservative estimate of 401 
sORFs. A study on the Arabidopsis small proteome 
assessed evolutionary conservation and examined evi- 
dence of transcription to predict the expression of as 
many as 3,241 sORFs [16]. A report on the mammalian 
small proteome by Frith et al. used FANTOM cDNA data
to identify a potential 1,240 sORFs using a CRITICA 
gene-detection program [17]. Additionally, 25 sORFs were 
GFP-tagged and, following transfection into cells, 14 of 
the fusion proteins were detected, providing evidence of 
translation [17]. More recently, using a novel combination 
of peptidomics and RNA-Sequencing (RNA-Seq), Slavoff 
et al. identified 86 novel small proteins in humans and
two were tagged and shown to localize to the mitochon- 
dria and cytoplasm [18]. Nevertheless, to date few func- 
tional studies of proteins encoded by sORFs have been 
performed. In yeast, 140 small proteins were tested by 
generating gene deletions and 22 had an effect on Saccha- 
romyces cerevisiae growth under various conditions [19], 
whereas overexpression of 473 small proteins in Arabidopsis
resulted in 49 recognizable phenotypes [20].

Mass spectrometry, a powerful technique in proteo- 
mics to validate the existence of putative protein candi- 
dates, has been applied in several studies [18,21-25]. 
High-resolution mass spectrometry provides very accur- 
ate precursor ion masses and combined with stringent 
statistical methods enhances the certainty of peptide 
identification [26]. This is a key issue in the validation of 
newly identified sORFs. In general, a protein database 
derived from the genome is used in shotgun proteomics 
to identify peptides and proteins from mass spectromet- 
ric raw data, but six frame translation of the genome is 
also frequently employed [24,25]. In either case, the cer- 
tainty of the existence of any protein can be increased 
by an observed corresponding RNA transcript. Recently, 
we used a combination of stringent methods, that is, 
ribosome footprinting, next generation sequencing and 
advanced mass spectrometric technology, to discover a 
plethora of novel sORFs in cytomegalovirus, many of 
which we determined to exist at the protein level [23]. 

The question of whether functional small proteins 
exist is particularly relevant in organisms with a tightly 
organized genome, such as the parasitic protozoan Try- 
panosoma brucei. Protein-coding genes are arranged in 
long unidirectional clusters with intergenic regions only 
a few hundred nucleotides in length, thus leaving little 
space for sORFs or non-coding RNAs. The initial se- 
quencing and annotation of the 11 megabase-sized chro- 
mosomes, published in 2005, predicted 9,068 protein- 
coding genes [27]. As of November 2013, this number 
has increased to 10,574 (TriTrypDB); however, a major 
challenge remains to identify all expressed proteins. This 
quest was addressed by several RNA-Seq studies using
Illumina high-throughput cDNA sequencing [28-31]. In
particular, we provided evidence that the coding poten- 
tial of the T. brucei genome was larger than originally 
anticipated by identifying 1,114 transcripts mapping to 
regions of the genome with no annotated ORFs [28]. A 
total of 993 of these transcripts have the potential to 
contain a coding sequence (CDS) of at least 25 amino 
acids and the remaining 121 transcripts either have no 
coding potential at all or no ORF larger than 75 nucleo- 
tides. However, it remains to be established whether 
these transcripts encode functional proteins. 

Building on the set of transcripts identified by our transcriptome
analysis [28], we applied bioinformatics approaches
to identify small proteins conserved across
kinetoplastid species and representative eukaryotes. Com- 
bined with mass spectrometry data, we pinpointed 42 
high-confidence small proteins ranging in size from 49 to 
219 amino acids. RNAi-knockdown revealed seven essen- 
tial proteins in the insect-stage of the life cycle and their 
diverse subcellular localizations suggested involvement in 
many aspects of T. brucei biology. 

Results 

T. brucei transcripts encoding evolutionarily conserved
potential small proteins 

We previously published a single-nucleotide resolution 
genomic map of the T. brucei transcriptome, which in- 
cluded 1,114 transcripts not originating from annotated 
CDS ([28]; original RNA-Seq data have been submitted to 
the National Center for Biotechnology Information (NCBI) 
Sequence Read Archive - SRA at [32] - under accession no. 
SRA012290 and the 1,114 transcripts are accessible through 
a community file, Tbrucei_novel_transcripts.fasta, on Tri- 
TrypDB at [33]). After a reexamination of this data set 
using the latest T. brucei genome annotation (GeneDB ver- 
sion 5, [34]), we excluded 39 and 10 transcripts coding for 
snoRNAs and annotated proteins larger than 300 amino 
acids, respectively, and added two novel transcripts coding 
for proteins identified by mass spectrometry (MS) data 
(Figure 1). Setting a lower limit of 25 aa, 987 of the 
remaining transcripts contain between one (112 transcripts) 
and 31 (1 transcript) ORFs for a total of 4,699 ORFs [see 
Additional file 1]. Eighty transcripts were classified as non- 
coding RNAs, since the predicted ORFs were less than 75 
nucleotides. However, we cannot exclude the possibility 
that the latter category has coding potential by using alter- 
native initiation codons or encoding proteins smaller than 
25 aa. 
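
The ORF enumeration described above can be illustrated with a minimal sketch. It assumes that an ORF runs from the first ATG in a reading frame to the next in-frame stop codon on the sense strand of the transcript; the source does not spell out its exact ORF definition, so the function name `find_orfs` and these rules are illustrative assumptions rather than the authors' pipeline. The 25 aa and 300 aa limits are taken from the text.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate a nucleotide string codon by codon (unknown codons become 'X')."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def find_orfs(transcript, min_aa=25, max_aa=300):
    """Return (frame, start, protein) for ATG-to-stop ORFs in the three sense frames.

    Assumption: only ATG is used as an initiation codon, and only ORFs
    closed by an in-frame stop codon are reported.
    """
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first ATG opens the ORF
            elif CODON_TABLE.get(codon) == "*":
                protein = translate(seq[start:i])
                if min_aa <= len(protein) <= max_aa:
                    orfs.append((frame, start, protein))
                start = None  # look for the next ORF in this frame
    return orfs
```

Restricting the scan to ATG start codons mirrors the caveat in the text that ORFs using alternative initiation codons, or encoding proteins smaller than 25 aa, would be missed by such a filter.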

Figure 1 Flowchart of the strategy used to analyze T. brucei transcripts not coinciding with annotated coding sequences (CDS).
Flowchart steps: 1,114 T. brucei transcripts not coinciding with annotated CDS identified by Kolev et al. [22]; remove 39 transcripts
encoding snoRNAs and 10 encoding proteins longer than 300 aa, and add 2 transcripts with matching MS data (1,067 transcripts);
remove 80 non-coding transcripts (<25 aa); 987 transcripts could encode 4,699 ORFs (25-300 aa); 173 ORFs conserved across
kinetoplastids, including 13 ORFs conserved across eukaryotes; 58 predicted small proteins supported by MS data; add 5 T. brucei-specific
proteins with matching MS data (63 small proteins); remove 21 proteins (predicted ribosomal or annotated proteins, or
multiple copies); 42 small proteins.

The selected 4,699 ORFs were highly enriched in short
ORFs (sORFs), that is, less than 100 amino acids, with
4,499 ORFs (96%) falling into this category [see Additional
file 2: Figure S1]. Since proteins encoded by sORFs largely
escape standard genome annotations, we examined evolutionary
conservation in combination with computational



approaches to screen for ORFs conserved in kinetoplasti- 
dae and representative eukaryotes as a benchmark for pro- 
tein expression. Kinetoplastid protists belong to the 
phylum Euglenozoa and include a significant number of 
disease-causing parasites, such as T. brucei and T. cruzi, 
the causative agents of African trypanosomiasis and Chagas
disease, respectively, and the Old and New World Leish- 
mania parasites, which cause various forms of leishmania- 
sis worldwide. First, we conducted Basic Local Alignment 
Search Tool (BLAST) analyses [35] of kinetoplastid ge- 
nomes and annotated proteins, excluding the T. brucei 
subspecies (see Methods for details). Of the 987 tran- 
scripts, 157 encoded one ORF that was conserved in at 
least one kinetoplastid organism and four transcripts (Tb4. 
NT.S1, Tb5.NT.84, Tb6.NT.58 and Tb8.NT.142) encoded 
between two and twelve conserved ORFs for a total of 173 
conserved ORFs [see Additional file 3]. Second, we com- 
pared the selected 4,699 ORFs to the annotated proteins 
from representative eukaryotes, namely S. cerevisiae, Cae- 
norhabditis elegans, Arabidopsis thaliana, D. melanoga- 
ster, Mus musculus and Homo sapiens. We found that 13 
ORFs had significant alignments with BLAST bit scores 
ranging from 34 to 227, with 6 coding for ribosomal pro- 
teins [see Additional file 4]. It is worth noting that these 
13 ORFs were part of the set conserved in kinetoplastids. 
We next surveyed the 173 conserved ORFs for known 
protein domain(s) using the CD-Search Tool (cdsearch/ 
cdd v3.10 [36]) and detected domains in 61 ORFs covering
a broad spectrum [see Additional file 5]. However, the 
ribosomal protein superfamily (six hits), various Zn finger
domains (five hits) and the RNA recognition motif (RRM)
superfamily (three hits) were overrepresented. Finally, our 
analysis of SignalP [37] and TMHMM [38] predictions re- 
vealed that 5 of the 173 potential small proteins have a 
predicted signal peptide and that a considerable number 
(43 or 25%) have a predicted trans-membrane domain 
with seven having more than one predicted domain. 

Identification of small protein candidates through mass 
spectrometry data 

For an alternative approach based on peptide evidence 
to recognize transcripts coding for small proteins, we 
surveyed MS data ([21,22] and this study) for peptides 
matching the 4,699 selected ORFs described above. As 
reported previously [28], searching the proteome data of 
Panigrahi et al. [21] provided evidence for the expres- 
sion of 16 small proteins, with all 16 being part of the 
173 small protein candidates identified bioinformatically. 
In addition, MS data from Butter et al. [22] and this 
study revealed 63 hits. As well as providing validation 
for 58 of the 173 small protein candidates, our data also 
predicted five small proteins specific to T. brucei with 
no recognizable homologues in other kinetoplastids. We 
also performed a search against six-frame translations of the
trypanosome genome, which revealed the same set of 
newly identified proteins (data not shown). Taken to- 
gether, we were able to provide supporting MS data for 
63 predicted small proteins with 22 being represented in 
more than one MS data set, and, except for the T. brucei- 
specific hits, all the other matches were among the 
evolutionarily conserved 173 small protein candidates
[see Additional file 3]. 
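
As a simplified illustration of how peptide evidence can be assigned to predicted ORFs, the sketch below maps observed peptide sequences onto ORF translations by exact substring matching. Real MS searches score spectra against a database with statistical controls, so this is only a toy model of the final bookkeeping step; the function name `match_peptides` and the input dictionaries are hypothetical and do not come from the source.

```python
def match_peptides(orf_proteins, peptides):
    """Map each ORF id to the set of observed peptides contained in its translation.

    orf_proteins: dict of ORF id -> predicted amino acid sequence (hypothetical input)
    peptides: iterable of peptide sequences identified by MS (hypothetical input)
    """
    hits = {}
    for orf_id, protein in orf_proteins.items():
        found = {pep for pep in peptides if pep in protein}
        if found:
            hits[orf_id] = found  # ORFs without any matching peptide are omitted
    return hits


def supported_in_multiple_sets(hits_per_dataset):
    """Return ORF ids that have peptide support in more than one MS data set."""
    counts = {}
    for hits in hits_per_dataset:
        for orf_id in hits:
            counts[orf_id] = counts.get(orf_id, 0) + 1
    return {orf_id for orf_id, n in counts.items() if n > 1}
```

Counting ORFs supported by more than one data set corresponds to the kind of tally reported in the text, where 22 of the 63 predicted small proteins were represented in more than one MS data set.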

The 63 small protein candidates with supporting MS 
data were filtered further by removing predicted ribosomal 
proteins or annotated proteins with a predicted function 
and CDS with multiple copies in the genome, leaving us 
with 42 small proteins for further analysis [see Additional 
file 3]. This final group of selected proteins ranges in size 
from 49 to 219 amino acids and 35 qualify as small proteins
(less than 100 aa). Transcript lengths vary from 333 to 4,100 nucleotides
and the average 5' UTR length is 119 nucleotides
with a median of 110 nucleotides. This is similar to the 
global analysis of the transcriptome [28,30], where a me- 
dian length of 128 to 130 nucleotides was reported. On 
the other hand, the 3' UTR is on average 390 nucleotides 
long with a median of 237 nucleotides, with the latter be- 
ing notably smaller than the medians reported in the 
aforementioned transcriptome studies, namely 400 nucle- 
otides [30] and 388 nucleotides [28]. 

Noteworthy characteristics of this collection of 42 
small proteins are as follows: three have putative homologues
in representative eukaryotes (Tb7.NT.49, Tb11.NT.47
and Tb11.NT.220); predicted domains include
two RRMs and two Zn-finger domains; sixteen have a 
predicted trans-membrane domain; and one has a pre- 
dicted signal peptide [see Additional file 3]. 

RNAi screen of the 42 small proteins revealed 7 to be 
essential in the insect life-cycle stage 

RNAi knock-down strategies have revolutionized the 
functional analysis of genes in T. brucei [39]. mRNA 
degradation is triggered most efficiently by double- 
stranded RNA (dsRNA) produced in vivo as a hairpin 
RNA transcribed from a tetracycline-inducible promoter. 
Thus, we generated a hairpin construct for each of the 
42 ORFs using the pTrypRNAiGate plasmid [40]. Each 
construct was stably integrated in the non-transcribed 
rRNA spacer region of a special procyclic-form recipient 
strain, named 29.13.6, expressing the tet repressor and 
T7 RNA polymerase [41], and clonal cell lines were 
established. Upon RNAi induction with tetracycline, 12 
cell lines had a growth phenotype that differed from un-induced
control cells. Three of the knockdowns resulted in a 
slow-growth phenotype. For example, RNAi of Tb5. 
NT.58 resulted in a cell division time of 16 hours as 
compared to 8.5 hours for un-induced cells and this 
phenotype was not accompanied by noticeable changes 
in cell morphology [see Additional file 2: Figure S2]. 
In addition, knockdown of two small proteins (Tb11.NT.222
and Tb11.NT.66) resulted in faster growth with
no obvious changes in cell morphology [see Additional 
file 2: Figure S2]. Monitoring cell growth after RNAi in- 
duction of the remaining seven revealed that all cell lines 
stopped dividing and eventually died, demonstrating that
Tb3.NT.18, Tb10.NT.86, Tb10.NT.87, Tb10.NT.90, Tb11.NT.28,
Tb11.NT.29 and Tb11.NT.108 are essential genes
(Table 1; Additional file 2: Figure S3). For subsequent 
analyses we focused on the seven essential predicted 
small proteins and the RNAi knockdowns revealing a 
change in the doubling time were not pursued further. 
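
Division times such as the 16 hours and 8.5 hours quoted above can be estimated from two cell counts if growth is assumed to be exponential over the interval. The source does not state how its values were calculated, so the following is a minimal sketch under that assumption, and the function name `doubling_time` is illustrative.

```python
import math


def doubling_time(n_start, n_end, hours):
    """Estimate the population doubling time in hours.

    Assumes exponential growth between the two counts:
    n_end = n_start * 2 ** (hours / Td), hence Td = hours * ln(2) / ln(n_end / n_start).
    """
    if n_start <= 0 or n_end <= n_start:
        raise ValueError("counts must be positive and the culture must have grown")
    return hours * math.log(2) / math.log(n_end / n_start)
```

For instance, a culture that quadruples in 17 hours has a doubling time of 8.5 hours under this model, whereas one that needs 32 hours to quadruple doubles every 16 hours.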

To confirm that the observed essentiality of the seven 
small proteins was specific to RNAi knockdown of the 
predicted transcript, we performed the following experi- 
ments. First, we verified the transcript length expected by 
the RNA-Seq data [28] using Northern blot analysis. In six 
of the seven cases a single predominant hybridizing band 
was detected and the observed size matched the predicted 
size within the limits of resolution of Northern blotting 
[see Additional file 2: Figure S4]. The seventh transcript, 
Tb3.NT.18, had two bands detected by Northern blot. 
One band corresponded to the size of the predicted novel 
transcript of 709 nucleotides. Further interrogation by 
Northern blotting and RT-PCR with probes specific for 
the upstream (Tb927.3.1080) and downstream annotated
gene (Tb927.3.1090) led us to conclude that the longer
RNA contained both Tb3.NT.18 and the downstream
transcript encoding a component of the T. brucei U1
small nuclear ribonucleoprotein (snRNP). This finding was
reminiscent of the presence of an upstream open reading
frame (uORF) described in organisms from fungi to
humans [42,43]. uORFs are defined as predominantly
short ORFs found in the 5' UTR of a previously annotated
gene and experiments are ongoing to investigate whether
Tb3.NT.18 qualifies as a uORF.

Second, semi-quantitative RT-PCR verified that the 
knockdown of the seven essential transcripts was effi- 
cient [see Additional file 2: Figure S5]. Third, to confirm 
the specificity of the RNAi knockdown, we set out to 
rescue each lethal phenotype with the expression of an 
RNAi-resistant construct. To do this, the CDS targeted 
by RNAi was assembled as a synthetic sequence bearing 
at least one silent mutation per 12 contiguous base-pairs 
[44] and flanked by heterologous UTR sequences (see 
Methods). In addition, an HA-TEV-FLAG epitope tag or 
a GFP tag was added to the C-terminus [see Additional 
file 2: Figure S6]. Upon co-expression of the hairpin tar- 
geting the endogenous transcript and the corresponding 
modified CDS in a stable cell line, the endogenous tran- 
script was destroyed, as shown by RT-PCR, and in all 
seven cases the cells survived on the RNAi-resistant 
transcript encoding the same small protein [see Additional 
file 2: Figure S3]. These results led us to conclude that the 
essential phenotype was a direct consequence of the 
knockdown of the targeted CDS. 
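
The idea behind an RNAi-resistant construct, namely changing the nucleotide sequence without changing the encoded protein, can be sketched as follows. This example replaces every codon that has a synonym with a different synonymous codon, which is denser than the stated minimum of one silent mutation per 12 contiguous base-pairs, and then measures the longest stretch left identical to the original. It ignores codon usage and is not the authors' design procedure; all function names are illustrative.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reverse lookup: amino acid -> list of codons encoding it.
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(cds):
    """Translate a CDS whose length is a multiple of three."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode_silently(cds):
    """Swap each codon for a different synonymous codon where one exists.

    Met (ATG) and Trp (TGG) have no synonyms and are left unchanged.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of three")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        alternatives = [c for c in SYNONYMS[CODON_TABLE[codon]] if c != codon]
        recoded.append(alternatives[0] if alternatives else codon)
    return "".join(recoded)


def longest_identical_run(a, b):
    """Length of the longest stretch of consecutive positions where a and b agree."""
    longest = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        longest = max(longest, run)
    return longest
```

A recoded sequence meets the criterion quoted in the text when `longest_identical_run(original, recoded)` is below 12 and `translate` returns the same protein for both sequences. Runs of Met or Trp codons could in principle violate the criterion, which is why the check is kept separate from the recoding step.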

Initial characterization of the essential small proteins 

Table 1 Characteristics of the seven essential proteins in procyclics

ORF ID        Transcript size (nt)   AA/MW (kDa)   pI      Predicted signal peptide   Predicted TM domain   Localization
Tb10.NT.87    478                    64/8.2        11.46   No                         No                    Mitochondria
Tb11.NT.28    1,285                  56/6.3        9.3     No                         Yes                   Mitochondria
Tb11.NT.29    796                    62/7.6        6.96    No                         Yes                   Surface
Tb3.NT.18     709                    85/9.3        7.91    No                         Yes                   Cytoplasm
Tb10.NT.86    605                    93/10.5       9.67    No                         Yes                   Cytoplasm
Tb11.NT.108   461                    96/10.5       6.49    No                         No                    Cytoplasm
Tb10.NT.90    529                    67/7.3        4.49    No                         Yes                   Nucleus

AA, number of amino acids; MW, molecular weight; ORF, open reading frame; pI, isoelectric point; TM, transmembrane.

The RNAi rescue experiments described above established
that the epitope-tagged small proteins were functional
and, thus, could be used for biochemical and cell bio- 
logical experiments. Using fluorescence microscopy we 
detected expression of all seven small proteins (Figure 2) 
and Western blot analysis confirmed that the proteins 
had the predicted relative molecular mass [see Additional 
file 2: Figure S7]. Tb11.NT.28 and Tb10.NT.87 revealed
a fluorescence pattern typical of the procyclic trypanosome
branched tubular mitochondrion (Figure 2). Tb11.NT.29
appeared to be a surface protein, an observation
supported by subsequent experiments (see below). By
immunofluorescence, three proteins (Tb3.NT.18, Tb10.NT.86
and Tb11.NT.108) were shown to be enriched in
the cytoplasm, with Tb11.NT.108 distributed throughout
this compartment, whereas Tb3.NT.18 and Tb10.NT.86
appeared somewhat concentrated around the
nucleus. Finally, Tb10.NT.90 had a distinct localization
in the nucleus, possibly indicative of the nucleolus.

Figure 2 Localization of the seven small proteins essential in the procyclic life cycle stage. Anti-HA or anti-GFP antibodies were used to
detect C-terminal HA- (Tb11.NT.28 and Tb11.NT.29) or GFP- (Tb10.NT.87, Tb10.NT.90, Tb10.NT.86, Tb3.NT.18, and Tb11.NT.108) tagged versions of the
proteins by fluorescence microscopy. DNA was stained with Hoechst (blue) and DIC images are shown in the right panels. DIC, differential
interference contrast.
[Panel columns: small protein, Hoechst, DIC. Panel rows: Tb10.NT.87, Tb11.NT.28, Tb11.NT.29, Tb3.NT.18, Tb10.NT.86, Tb11.NT.108, Tb10.NT.90.]

Since T. brucei undergoes extensive morphological and
metabolic changes during its life cycle alternating between
the mammalian (bloodstream) and insect (procyclic) hosts,
it was of interest to gauge the essentiality of the seven
small proteins in bloodstream forms. Thus, the hairpin 
constructs were transfected into a bloodstream form cell 
line competent for RNAi and, following induction, Tb11.NT.29,
a potential surface protein, and Tb10.NT.87, a
probable mitochondrial protein, were shown to be essential
in this stage of the life cycle (Figure 3A and B). Growth
curves for the five nonessential proteins can be found in
Figure 3C and Additional file 2: Figure S8. Based on the
above results, we selected Tb11.NT.29, Tb11.NT.28 and
Tb10.NT.87 for further analysis.

Tb11.NT.29: a putative surface protein

The 796 nucleotide-long Tb11.NT.29 transcript encodes a
62 aa protein, which is highly conserved in kinetoplastids
(84% identity between T. brucei and L. major) and has a
predicted trans-membrane domain (Figure 4A). Initial immunofluorescence
suggested that Tb11.NT.29 might be
localized on the surface (Figure 2). To address this possibility,
we compared the signal for cells expressing epitope-tagged
Tb11.NT.29 that were either permeabilized by
detergent prior to antibody exposure or remained non-permeabilized.
Under permeabilized conditions we detected
both Tb11.NT.29 and the endoplasmic reticulum
(ER) protein BiP (Figure 4B). However, when we omitted
the permeabilization step, thus limiting access of antibodies
to potential surface molecules, no signal was
detected for BiP, whereas the signal for Tb11.NT.29
was still visible (Figure 4B). This behavior was similar
to procyclin (Figure 4C), a well-characterized T. brucei
surface protein specific for the procyclic life-cycle stage
[45]. Since the epitope tag was at the C-terminus of
Tb11.NT.29, this result also suggested that this portion
of the protein was exposed on the surface. To corroborate
this localization, we performed cell fractionation
experiments and by Western blot analysis Tb11.NT.29
was enriched in the membrane fraction, similar to procyclin,
whereas HSP70 was, as expected, enriched in
the cytoplasmic fraction (Figure 4D).

Figure 3 Bloodstream-form growth following RNAi. (A) Tb11.NT.29 RNAi in bloodstream-form cells (BS). (B) Tb10.NT.87 RNAi in BS cells.
(C) Tb11.NT.28 RNAi in BS cells. Growth of un-induced (-tet) and induced cells (+tet) shown in log scale against days of RNAi. The data are based on three
independent experiments, and average values with standard deviations are presented.

Figure 4 Characterization of Tb11.NT.29. (A) Sequence conservation of Tb11.NT.29 across kinetoplastids. Amino acid sequences from T. brucei
(Tbr), T. cruzi (Tcr) and L. major (Lm) were aligned with ClustalW and conserved residues are shaded with black boxes while similar residues are
shaded in grey. The predicted transmembrane domain (TM helix) is indicated. (B) Immunofluorescence analysis on cells expressing a C-terminal
HA-tagged version of Tb11.NT.29. Permeabilized cells (left three panels) and non-permeabilized cells (right three panels) were probed for HA
(green) and BiP (red) and DNA was stained with Hoechst (blue). (C) Immunofluorescence analysis of procyclin as described in panel (B).
(D) Cell fractionation of parasites expressing a GFP-tagged version of Tb11.NT.29. Western blot analysis was performed against GFP-tagged
Tb11.NT.29 (top panel), procyclin (second from top), BiP (third panel from top) and HSP70 (bottom panel) on total (T), cytoplasmic (C), membrane (M),
nuclear (N) and cytoskeleton (CSK) fractions.

To begin to probe the potential role of Tb11.NT.29,
we monitored the effect of RNAi knockdown on cell
cycle progression and cell morphology. In procyclic cells
RNAi resulted in a slowdown in growth after two days
followed by cell death between day five and six post-induction
(Figure 5A), whereas the RNAi effect was
more pronounced in bloodstream form cells with cell
death occurring between day one and two (Figure 3A).
For cell cycle analysis in procyclics, parasites were
stained with Hoechst at various time points after induction
and the number and position of nuclei and kinetoplasts
(mitochondrial kDNA) in each cell were recorded.
Cells with one kinetoplast and one nucleus (1K1N) are
in G1 of the cell cycle, cells with two kinetoplasts and
one nucleus (2K1N) have segregated the kinetoplast and
are at the end of S phase, and cells with two kinetoplasts 
and two nuclei (2K2N) have completed mitosis and are 
poised for cytokinesis [46]. Any other arrangement is aber- 
rant and might point to defects in cell cycle progression. 
In wild-type cells, as expected for an asynchronously 
growing cell population, the majority of parasites will be 
1K1N with about 10% of cells having either a 2K1N or 
2K2N configuration. 
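
The kinetoplast/nucleus scoring scheme described above lends itself to a simple tally. The sketch below classifies each cell from its counted kinetoplasts (K) and nuclei (N) and reports the percentage of each configuration; the labels for the three normal configurations come from the text, the 1K0N (zoid) label from the source's results, and the function names and input format are assumptions made for illustration.

```python
from collections import Counter

# Configurations expected during a normal cell cycle, as described in the text.
NORMAL = {(1, 1): "1K1N", (2, 1): "2K1N", (2, 2): "2K2N"}


def classify(kinetoplasts, nuclei):
    """Label a cell by its kinetoplast and nucleus counts."""
    key = (kinetoplasts, nuclei)
    if key in NORMAL:
        return NORMAL[key]
    if key == (1, 0):
        return "zoid (1K0N)"
    return f"aberrant ({kinetoplasts}K{nuclei}N)"


def tally(cells):
    """Return the percentage of cells in each configuration.

    cells: iterable of (kinetoplasts, nuclei) tuples, one per scored cell.
    """
    counts = Counter(classify(k, n) for k, n in cells)
    total = sum(counts.values())
    return {label: 100.0 * count / total for label, count in counts.items()}
```

Scoring the same population at successive time points after induction then shows whether aberrant configurations accumulate relative to un-induced cells.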

RNAi-induced down-regulation of Tb11.NT.29 in procyclics
resulted in the accumulation of cells containing
either 1K2N or 1K0N, the latter referred to as zoids
(Figure 5B). Zoids and 1K2N cells increased nearly 
equally in number and after three days of induction they 
comprised 7.9% and 6.9% of the cell population, respectively.
Several morphological changes were observed after
the knockdown, including flagellar detachment both as
specific areas of separation between the cell membrane 
and the flagellum, and complete separation with only 
one visible contact point between the cell body and fla- 
gellum (Figures 5C and D). We also noted a change in 
the shape of the cell body that appeared to be specific to 
2K1N or 2K2N cells. After kinetoplast replication and 
separation, a narrowing of the cell body was evident be- 
tween the two daughter kinetoplasts (Figure 5D), which 
might indicate a defect in cytokinesis. 

Tb11.NT.28: a mitochondrial inner membrane protein

The 56-amino acid Tb11.NT.28 protein is 45% identical
between T. brucei and L. major and contains a
predicted trans-membrane domain (Figure 6A). Initial
immunofluorescence staining revealed a similar localization
pattern between Tb11.NT.28 and the fluorescent dye MitoTracker
Red, a cell-permeable mitochondrion-selective
probe (Figure 6B). Cell fractionation experiments corroborated
the potential mitochondrial localization in that Tb11.NT.28
was enriched in the mitochondrial fraction similar
to RNA-editing associated protein (REAP), a known mitochondrial
marker [47], whereas the cytoplasmic HSP70 was
excluded from this fraction (Figure 6C). To determine in
which mitochondrial compartment Tb11.NT.28 might be
localized, we exposed whole cells to increasing concentrations
of digitonin, a detergent that preferentially solubilizes
the plasma membrane and the outer membrane of the
mitochondria. As the digitonin concentration is increased,
specific mitochondrial compartments have been shown to
be solubilized, with 0.015% digitonin releasing proteins
from the inter-membrane space, 0.025% digitonin solubilizing
matrix proteins, 0.04% digitonin resulting in release
of outer membrane proteins and 0.1% digitonin solubilizing
inner membrane proteins [48,49]. As Tb11.NT.28 was
only released upon exposure to 0.1% digitonin, its likely
localization is the inner membrane (Figure 6D), since
solubilization of trypanosome alternative oxidase (TAO), a
known inner membrane protein of T. brucei mitochondria
[50], occurred with the same digitonin concentration,
whereas a portion of mitochondrial HSP70, a matrix protein,
was released with as little as 0.015% digitonin.
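
The interpretation of the digitonin titration can be written down as a lookup from the lowest concentration at which a protein is released to the compartment associated with that concentration. The thresholds below are the ones quoted in the text [48,49]; the function name and the rule of taking the highest threshold not exceeding the observed concentration are simplifying assumptions. In practice the assignment also relies on comparison with marker proteins such as TAO and mitochondrial HSP70.

```python
# Digitonin concentrations (percent) and the compartment whose proteins are
# released at that concentration, as quoted in the text.
THRESHOLDS = [
    (0.015, "inter-membrane space"),
    (0.025, "matrix"),
    (0.04, "outer membrane"),
    (0.1, "inner membrane"),
]


def likely_compartment(first_release_percent):
    """Return the compartment suggested by the lowest releasing concentration.

    Picks the highest threshold that does not exceed the observed value;
    returns None if release occurs below the lowest listed concentration.
    """
    label = None
    for threshold, compartment in THRESHOLDS:
        if first_release_percent >= threshold:
            label = compartment
    return label
```

Under this rule a protein first released at 0.1% digitonin is assigned to the inner membrane, matching the reasoning applied to Tb11.NT.28 above.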

RNAi-induced knockdown of Tb11.NT.28 resulted in
cell death in procyclic forms (Figure 7A), but did not 
affect growth in bloodstream forms (Figure 3C). An ana- 
lysis of cell cycle progression in the procyclic cells fol- 
lowing RNAi did not result in an accumulation of cells 
containing aberrant DNA amounts. However, a steady 
increase in the number of 2K1N cells was observed with 
this cell type constituting one third of all cells four days 
post-induction (Figure 7B). In this category, 50% of cells 
had duplicated the kinetoplast, but the daughter kinetoplasts
remained linked in a dumbbell-shaped body, a
larger proportion than seen in wild-type cells [46]. This indicated
that although cells had entered S phase, it had not
been completed.

Tb10.NT.87: a mitochondrial matrix protein

Tb10.NT.87, a very basic protein of 64 aa, is highly
conserved in kinetoplastids except for the first 10 aa
(Figure 8A). Performing fluorescence microscopy on
live cells expressing the GFP-tagged RNAi-resistant
construct revealed similar localization patterns between
Tb10.NT.87 and MitoTracker, indicative of a mitochondrial
localization (Figure 8B). Cell fractionation experiments
were consistent with the immunofluorescence
data (Figure 8D) and digitonin solubilization assays,
as described above, suggested that Tb10.NT.87 was a
matrix protein, since the solubilization properties were
similar to mitochondrial HSP70, an established marker
for the matrix (Figure 8E). To further pinpoint its cellular
localization, we used immunogold electron microscopy
of cells expressing HA-TEV-FLAG tagged Tb10.NT.87.
Micrographs of thin sections showed that the
protein was in the mitochondrial matrix and its distribution
appeared to be uniform (Figure 8C).

Tb10.NT.87 was shown to be essential in both procyclic
and bloodstream forms (Figures 9A and 3B) and a
significant increase in cells with more than two nuclei
and grossly enlarged cell bodies was noted after knockdown
of Tb10.NT.87 in procyclics (Figures 9B and C).
After four days of induction, 12.5% of cells had between 
three and eight nuclei. Most of these cells contained a 
single kinetoplast, indicating that although mitosis was 
occurring there was not a corresponding replication and
division of mitochondrial DNA. The enlarged cell body
remained as one unit with a single flagellum as expected 
in a cell with one kinetoplast. A substantial number of 
zoids (13.7%) and 1K2N cells (16%) also accumulated by 
this time point. In accordance with the presence of a 
single kinetoplast in many cells containing multiple nu- 
clei, a potential defect in kDNA replication was further 
manifested by the appearance of cells with small kDNA 
or no detectable kDNA (Figures 9D and E). For example, 
after three days of Tb10.NT.87 RNAi, 32% of cells had
normal-sized kDNA, whereas 65% of the cells were 
scored as having a small kDNA. 

Discussion 

Although genome sequencing projects have provided a 
wealth of information about genome structure and 
organization, they also encountered a challenge to cata- 
logue all protein-coding genes. Since gene annotation pro- 
grams do not perform well in predicting small proteins, 
that is, less than 100 aa, it has been common practice to 
set an artificial length cutoff of 100 aa to avoid flawed pre- 
dictions. However, recent computational and functional 
studies highlighted the existence and importance of small 
proteins in numerous organisms [16-20,51]. Our approach
to gauge the extent of the small proteome in T. brucei integrated
experimental data (transcriptome and MS data)
and evolutionary conservation. Starting with 1,117 tran- 
scribed sequences not mapping to an annotated CDS [28], 
987 have the potential to code for one or more proteins of 
at least 25 amino acids. This data set was then examined 
for conserved proteins in related kinetoplastids and repre- 
sentative eukaryotes. A similar strategy was applied in 
yeast, where more than 60% of the 299 small proteins 
identified had significant similarities with annotated pro- 
teins in other eukaryotes, including humans [19]. Likewise, 
an analysis of the M. musculus small proteome revealed 
that two-thirds of the potential sORFs identified were con- 
served in rat and 50% were conserved in human [17]. In 
contrast, only 1.3% (13) of the predicted ORFs in T. brucei 
had potential homologues in representative eukaryotes, 
with half of them coding for ribosomal proteins. One pos- 
sible explanation for this stark difference in evolutionary 
conservation might be the early divergence of T. brucei 
from other eukaryotes. We observed a comparable lack of 
conservation when searching for potential homologues in 
related kinetoplastid species: 16.3% (161) of the 987 tran- 
scripts have predicted ORFs with a significant homology 
to annotated proteins or a six-frame translation of the 
available genomes. This result was somewhat unexpected, 
since more than half of the small proteins identified in 
Arabidopsis had homologues in four closely related plants 
[16] and more than 3,000 sORFs were found to be con- 
served between D. melanogaster and a closely related spe- 
cies, D. pseudoobscura [15]. On the other hand, since it 
has been argued that conservation in closely related spe- 
cies is evidence for translation, the 173 small proteins 
identified by evolutionary conservation might represent a
substantial proportion of the small proteome. The currently
available MS data support this view, since the peptide
hits were largely confined to the set of 173 proteins.
The only exception was the identification of five T. brucei- 
specific proteins. Thus, to fully expose the catalogue of 
small proteins, future approaches will have to concentrate 
on the identification of T. brucei-specific ORFs through 
MS analysis or ribosome profiling [52,53].

Examining the functional importance of small proteins 
in yeast revealed that 22 (15.5%) of 140 sORF knockout
cell lines had an essential phenotype in specific growth 
conditions [19] while in Arabidopsis, overexpression of 
10% of a handpicked set of almost 500 small proteins 
resulted in an abnormal phenotype [20]. Our results 
showed that 16.7% of the ORFs tested were essential in 
procyclic cells, while an additional 12% altered normal 
growth patterns. For example, Tb11.NT.28 was essential
in procyclics but not in bloodstream-form parasites. We 
further provided evidence that this small protein of 56 
amino acids is likely localized to the inner membrane of 
the mitochondria. In T. brucei, both the size and activity 
of the mitochondria vary dramatically between life cycle 
stages. In the bloodstream form, mitochondrial respira- 
tory activity is repressed and a single tubule of the or- 
ganelle is maintained. On the other hand, in procyclic 
cells the mitochondrion forms an extensive, branching 
network and has active respiration. Although further ex- 
periments will be required to elucidate the function of 
Tb11.NT.28, examination of cell cycle progression following
RNAi induction revealed a decreased ability of
the cells to divide replicated kinetoplasts. In contrast,
Tb10.NT.87, a mitochondrial matrix protein of 64 amino
acids, is essential in both procyclic- and bloodstream-form
parasites. Ablation of this protein in procyclics resulted
in the accumulation of cells containing multiple nuclei
and a single kinetoplast, as well as cells with small or
no kinetoplast, suggesting that kinetoplast replication is impaired.
A similar scenario was observed in bloodstream-form
cells upon RNAi of Tb10.NT.87 (data not shown).
The basic nature of Tb10.NT.87 (pI of 11.5) is intriguing
and will need further investigation. Nevertheless, our
current data are consistent with Tb10.NT.87 playing a role
in kinetoplast replication, which is an essential process in
both life cycle stages [54]. Finally, Tb11.NT.29, indispensable
in both procyclic and bloodstream stages, is most
likely localized on the cell surface. Two days post-induction,
cells appeared with an asymmetrical hourglass shape and a
kinetoplast sequestered in the smaller half of the cell, while
the remaining portion contained 1K1N or 1K2N. The proportional
accumulation of zoids and 1K2N cells that began
after these cells arose suggested that an aberrant cytokinesis
of these asymmetrical hourglass cells occurred.

Conclusions 

Our study provides evidence for the existence and importance
of small proteins in the human pathogen T. brucei.
At the same time, it is somewhat puzzling that an unexpectedly
low number of transcripts not matching annotated
proteins or having conservation with closely related
species were identified as containing functional ORFs. 
Even though there may be small proteins expressed at low 
levels not yet detectable by MS and others that might be 
expressed at specific stages of the life cycle, it is tempting 
to speculate that the T. brucei transcriptome includes a 
substantial number of non-coding RNAs. As all the small 
proteins identified as essential are unique to kinetoplas- 
tids, they may become new targets to block the survival of 
trypanosomes in the insect vector and/or the mammalian 
host. The next important question to be tackled is the 
mechanism of action of these small proteins. 

Methods 

Standard methods 

Western blots [55], transfection of procyclic cells [56] and 
RNA isolation [56] were performed using previously pub- 
lished protocols. Oligonucleotides used to prepare clones 
and probes for Northern blots are listed in Additional 
file 2: Figure S9.

Figure 9 Analysis of growth and cell cycle progression following RNAi of Tb10.NT.87. (A) Tb10.NT.87 RNAi in procyclics (PC).

Bioinformatics

The 987 transcripts were translated using the getorf pro- 
gram of the European Molecular Biology Open Software 
Suite setting a lower limit of 25 aa and including only 
ORFs that contained a start and stop codon [57]. 
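
To illustrate the ORF criteria stated above (a start codon, a stop codon and at least 25 aa), the following is a minimal Python sketch of a six-frame scan. It is not the EMBOSS getorf implementation and makes simplifying assumptions: ATG is the only start codon, the standard stop codons are used, the first ATG after a stop opens the ORF, and the function names and return format are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=25):
    """Scan all six frames for ATG...stop ORFs of at least min_aa residues.

    Returns (strand, start, end, length_aa) tuples; start and end are 0-based
    coordinates on the scanned strand and end includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand_label, strand in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon == "ATG" and start is None:
                    start = i  # first start codon after the previous stop
                elif codon in STOP_CODONS and start is not None:
                    length_aa = (i - start) // 3  # stop codon not counted
                    if length_aa >= min_aa:
                        orfs.append((strand_label, start, i + 3, length_aa))
                    start = None
    return orfs


# Hypothetical transcript: one 31-aa ORF (Met + 30 Ala) in forward frame 3.
example = "CC" + "ATG" + "GCT" * 30 + "TAA" + "G"
print(find_orfs(example))
```

ORFs lacking either a start or a stop codon within the transcript are never reported by this sketch, mirroring the requirement described in the text.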

We used the NCBI BLAST suite (BLAST 2.2.28, [35]) 
for our protein searches. The predicted 4,699 T. brucei 
ORFs were used as queries for blastp to search all non- 
redundant kinetoplastid protein sequences (taxid: 5653), 
using an e-value cutoff of 0.1. T. brucei (taxid: 5691),
T. b. gambiense (taxid: 31285) and T. b. brucei (taxid: 5702)
sequences were excluded from the search. Similarly, 
tblastn was used to search the kinetoplastid translated 
nucleotide database with an e-value cutoff of 0.1. The 
annotated proteins of S. cerevisiae (taxid: 4932), C. ele- 
gans (taxid: 6239), A. thaliana (taxid: 3702), D. melano- 
gaster (taxid: 7227), M. musculus (taxid: 10090) and H. 
sapiens (taxid: 9606) were queried with the same strat- 
egy. All alignments were manually inspected and verified 
to exclude false positives due to the relaxed threshold. 
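
The filtering logic described above can be sketched as a post-processing step on tabular BLAST output. This is an assumption-laden illustration rather than the procedure actually used: it assumes a tab-separated file with the columns query ID, subject ID, e-value and subject taxids (for example as produced by a custom tabular output format), and the function name is hypothetical. The taxids and the 0.1 cutoff are taken from the text.

```python
# Taxids excluded from the search according to the text
# (T. brucei, T. b. gambiense, T. b. brucei).
EXCLUDED_TAXIDS = {"5691", "31285", "5702"}


def filter_hits(lines, max_evalue=0.1, excluded=EXCLUDED_TAXIDS):
    """Keep hits at or below the e-value cutoff whose taxids are not excluded.

    Each line is assumed to hold: query, subject, e-value, taxids
    (semicolon-separated), separated by tabs. This column layout is an
    assumption made for the example.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue, taxids = line.split("\t")[:4]
        if float(evalue) > max_evalue:
            continue
        if set(taxids.split(";")) & excluded:
            continue
        kept.append((query, subject, float(evalue)))
    return kept
```

Because a cutoff of 0.1 is deliberately permissive, any such automated filter would only be a first pass; the surviving alignments still need the manual inspection described above.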



The predicted T. brucei proteins were scanned for domains
using the NCBI CD-Search Tool (cdsearch/cdd
v3.10; [36]). Transmembrane helices in proteins were
predicted using the TMHMM Server v. 2.0 [38,58] and 
the presence and location of signal peptide cleavage sites 
were scanned at the SignalP 4.1 Server [37,59]; both 
servers are at the Technical University of Denmark. 

Mass spectrometry analysis 

Procyclic-form trypanosomes (MiTat 1.4) were lysed in 
50 mM Tris (pH 7.3) with 4% SDS. To decrease sample 
complexity, the samples were filtered directly through
Amicon Ultracel centrifugal filter units (Millipore,
Billerica, MA, USA) with different molecular weight
cut-offs (MWCO; 3 kDa, 10 kDa, 30 kDa and 50 kDa)
under strong denaturing conditions with 8 M urea. The
filtrate was precipitated using 4 volumes of ethanol and subsequently
resuspended in 20 µl 8 M urea. The samples were reduced
with 1 mM dithiothreitol (DTT) and alkylated using
5 mM iodoacetamide, prior to digestion with 0.2 µg Lys-C
(Wako, Richmond, VA, USA) for three hours followed by
digestion with 0.2 µg trypsin (Promega, Madison, WI,
USA) overnight. The peptides were desalted using a 
StageTip [60]. For MS analysis, peptides were separated by 
a nanoflow liquid chromatography EASY-nLC system on a 
capillary packed with Reprosil-C18 (Dr. Maisch) with an 
acetonitrile gradient from 2% to 60% at a flow rate of 
250 nl/minute for 230 minutes. The Orbitrap XL mass 
spectrometer was operated in a data-dependent acquisi- 
tion mode performing Top 10 MS/MS per full cycle. Data 
analysis was done with MaxQuant version 1.2.0.18 [26] 
using a concatenated database of TREU 927 v.2.3 (10,533 
entries, tritrypDB.org) and the hits generated by the RNA- 
Seq analysis (987 entries, Additional file 1). Enzyme search 
specificity was set for tryptic peptides. Up to two misclea- 
vages for each peptide were allowed. Carbamidomethylation 
on cysteines was set as fixed modification, while methionine 
oxidation and protein N-acetylation were considered as 
variable modifications. The search was performed with an 
initial mass tolerance of 6 ppm mass accuracy for the pre- 
cursor ion and 0.5 Da. False discovery rate was fixed at one 
percent on peptide and protein level. For the second data 
set [22], we reanalyzed previously generated MS data using 
MaxQuant standard settings with the above described 
concatenated database. The MS proteomics data have been 
deposited to the ProteomeXchange Consortium [61] via the 
PRIDE partner repository [62] with the dataset identifier 
PXD000711 [22] and PXD000712 (this study). 
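
As a brief illustration of what a 6 ppm precursor tolerance means in practice, the sketch below computes the relative mass error between an observed and a theoretical m/z. It is a generic calculation, not MaxQuant code; the function names and the example values are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=6.0):
    """True if the observed precursor lies within +/- tol_ppm of theory."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical precursor at m/z 1000: a 0.005 shift is 5 ppm (accepted),
# a 0.007 shift is 7 ppm (rejected).
print(within_tolerance(1000.005, 1000.0), within_tolerance(1000.007, 1000.0))
```

At m/z 1000, a 6 ppm window therefore corresponds to an absolute deviation of only 0.006, which is far tighter than the 0.5 Da value also quoted above.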

Cell culture 

T. brucei 29.13.6 Lister 427 procyclic cells [41] were cultured
at 28°C with 5% CO2 in Cunningham's media supplemented
with 10% Tet-system approved heat inactivated
fetal bovine serum (FBS, Clontech, Mountain View, CA,
USA), 2 mM L-glutamine, 100 units/ml penicillin, 100 µg/ml
streptomycin, 50 µg/ml gentamicin, 15 µg/ml G418
and 50 µg/ml hygromycin B. A total of 1 × 10^7 29.13.6 cells
were used for each procyclic-form transfection. Cells were
spun down, washed in Cytomix (20 mM KCl, 0.15 mM
CaCl2, 10 mM K2HPO4, 25 mM 4-(2-hydroxyethyl)piperazine-1-ethanesulfonic
acid (Hepes), 2 mM ethylenediaminetetraacetic
acid (EDTA) and 5 mM MgCl2, pH 7.6),
then resuspended in 500 µl Cytomix. Then, 25 µg of linearized
plasmid DNA was added to the solution and cells were
pulsed twice at 1,600 V with a time constant of 0.6 ms on a
GenePulser Xcell (BioRad, Hercules, CA, USA). Cells were
allowed to recover for 24 hours before selective drug was
added. Phleomycin and blasticidin were added to a final
concentration of 2.5 µg/ml and 10 µg/ml, respectively.
Transfected cells were cloned at least 24 hours after transfection.
Serial dilutions of the cells were then made in
media with 20% serum and the presence of 3 × 10^7 29.13.6
cells that had not been transfected. A total of 200 µl of
transfected cells were added to 1.8 ml of the cloning media
and further diluted in six, five-fold dilutions. Each dilution
was plated in a 96-well plate and clones were selected from
dilutions where fewer than 30% of wells had growth. T.
brucei SM Lister 427 bloodstream-form cells [41] were
maintained at 37°C with 5% CO2 in HMI-9 media supplemented
with 10% Tet-system approved heat inactivated
FBS (Clontech), 100 units/ml penicillin, 100 µg/ml streptomycin,
50 µg/ml gentamicin, and 2.5 µg/ml G418. For
bloodstream form transfections, 3 × 10^7 cells were centrifuged,
washed quickly with Tb-BSF buffer [63], resuspended
in 100 µl Tb-BSF buffer containing 10 µg of plasmid DNA
and transfected with protocol X-01 in an AMAXA Nucleofector®.
Transfected cells were placed in 30 ml pre-warmed
HMI-9 media and two 10-fold serial dilutions were plated
in a 24-well plate. Six hours after transfection, pre-warmed
medium supplemented with appropriate selectable drugs
was added. The final concentration of selectable markers
was 2.5 µg/ml phleomycin and 5 µg/ml blasticidin. For induction
of hairpin or RNAi-resistant construct expression,
10 µg/ml and 1 µg/ml of doxycycline was added to
procyclic- and bloodstream-form cells, respectively.
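
The rationale for picking clones only from plates where fewer than 30% of wells showed growth can be made explicit with a standard limiting-dilution estimate. The sketch below assumes that cells are distributed among wells according to a Poisson distribution and that every seeded cell grows out; both are textbook assumptions, not statements from the protocol above, and the function name is hypothetical.

```python
import math


def probability_clonal(fraction_wells_with_growth):
    """Estimate the chance that a well with growth started from a single cell.

    Assumes Poisson seeding: P(growth) = 1 - exp(-mean_cells_per_well).
    """
    if not 0.0 < fraction_wells_with_growth < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    mean_cells = -math.log(1.0 - fraction_wells_with_growth)
    p_exactly_one = mean_cells * math.exp(-mean_cells)
    return p_exactly_one / fraction_wells_with_growth


# With 30% of wells positive, a positive well is clonal with probability ~0.83.
print(round(probability_clonal(0.30), 3))
```

Under these assumptions, lower fractions of positive wells give a higher probability of clonality, which is why the sparsest dilutions are preferred.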

Plasmid constructions 
Constructs encoding hairpin RNAs 

We followed the Gateway-adapted cloning scheme developed
by Margaret Phillips and colleagues [40]. Briefly,
a 300 to 400 base pair region was PCR amplified and 
TA-cloned (deoxythymidine, T, deoxyadenosine, A) into 
plasmid pCR/8GW/TOPO (Invitrogen, Grand Island, 
NY, USA) to generate an entry clone. The entry clone 
was then recombined with the destination vector 
pTrypRNAiGate. Final constructs were verified by re- 
striction enzyme digestions and DNA sequencing. 

RNAi-resistant constructs 

To rescue the lethal RNAi phenotype, RNAi-resistant 
versions of the CDS were synthesized by GeneWiz, Inc, 
Cambridge, MA, USA. A silent mutation was introduced 
every twelfth nucleotide [44], a C-terminal HA-TEV-FLAG
tag was added and the CDS was flanked at the 5'
and 3' end by a HindIII and BamHI restriction site, respectively.
The RNAi-resistant construct was cloned into
the inducible pLew100v5 BSR plasmid flanked by the
GPEET procyclin 5' UTR and an aldolase 3' UTR. In
addition, inducible GFP-tagged RNAi-resistant constructs
were generated in pLew100v5 BSR. The plasmids containing
the RNAi-resistant constructs were digested with NotI
to allow for recombination into a T. brucei ribosomal
RNA spacer region following transfection. For each 
knock-down, ten clones were tested for a phenotype. 
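
The idea of introducing a silent mutation every twelfth nucleotide can be sketched as follows. This is a hypothetical illustration, not the design procedure used for the synthesized constructs: it assumes an in-frame CDS, the standard genetic code, and that each twelfth nucleotide (the third base of every fourth codon) is replaced by the first synonymous alternative found; codons without a synonymous third-base change (for example Met and Trp) are left untouched.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate an in-frame coding sequence (stop codons shown as '*')."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def recode_every_twelfth(cds):
    """Change every twelfth nucleotide to a synonymous base where possible."""
    seq = list(cds.upper())
    for pos in range(11, len(seq), 12):  # 0-based index of nt 12, 24, 36, ...
        codon = "".join(seq[pos - 2:pos + 1])
        for base in BASES:
            if base != codon[2] and CODON_TABLE[codon[:2] + base] == CODON_TABLE[codon]:
                seq[pos] = base
                break  # keep the first synonymous alternative
    return "".join(seq)


# Hypothetical CDS: Met followed by seven Ala codons and a stop codon.
original = "ATGGCTGCTGCTGCTGCTGCTGCTTAA"
resistant = recode_every_twelfth(original)
print(resistant, translate(resistant) == translate(original))
```

The recoded sequence encodes the same protein while differing from the endogenous transcript at regular intervals, which is the property that lets the rescue construct escape the RNAi trigger.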

Northern blotting 

Total RNA was separated on 1.5% agarose gels in the 
presence of 6.3% formaldehyde in 40 mM 3-morpholinopropane-1-sulfonic
acid (MOPS) and 2 mM
EDTA. RNA ladder 0.5 to 10 kb (Invitrogen) was used as
marker. The RNA was transferred overnight to a Hybond-N
nylon membrane (GE Healthcare, Little Chalfont, United
Kingdom) by capillary transfer with 10x SSC (0.15 M sodium
citrate, 1.5 M sodium chloride), UV cross-linked to
the membrane and stained with methylene blue. The membrane
was pre-hybridized for one hour in 5x SET (0.75 M
sodium chloride, 5 mM EDTA, 0.15 M Tris-HCl, pH 7.4),
10x Denhardt's solution, 1% SDS and 100 µg/ml yeast
RNA, and then hybridized overnight in the same solution.
DNA probes were internally labeled by synthesis with
specific dsDNA templates, sense and antisense primers and
Pfx DNA polymerase (Invitrogen) in the presence of
[α-32P]dCTP. The membrane was washed two to three
times (10 minutes each) with 2x SSC, 0.1% SDS and
hybridization signals were detected by PhosphorImager.

Semi-quantitative reverse transcriptase (RT)-PCR 

For each sample, 5 µg of total RNA was treated with 2 units
of DNase RQ (Promega), phenol extracted and ethanol precipitated.
DNase-treated RNA was reverse transcribed
using random primers (Promega) and Superscript II enzyme
(Invitrogen) according to the manufacturer's protocol.
Twenty-two cycles of PCR were then performed using
Platinum Pfx (Invitrogen), according to the manufacturer's
instructions, for each knockdown with histone 4 used as a
control. An annealing temperature of 50°C was used for all 
oligonucleotides with an extension time of 30 seconds. 

Immunofluorescence 

A total of 5 to 8 × 10^6 cells were spun down and washed
twice with cell wash (20 mM Tris-HCl (pH 7.5),
100 mM NaCl and 3 mM MgCl2) before being placed
on slides coated with poly-L-lysine and settling for five 
minutes. Then, 4% paraformaldehyde (PFA) was added 
and the cells were fixed for 30 minutes at 4°C. Cells were 
washed twice and then exposed to 0.1% NP-40 detergent 
in a solution of 2% goat serum. Slides were washed again 
and blocked with 10% goat serum for 10 minutes. Upon 
removal of blocking solution, primary antibody, diluted 
in 2% goat serum, was administered for one hour. Anti- 
GFP (Roche, Basel, Switzerland), anti-HA (Covance, 
Princeton, NJ, USA), and α-GPEET procyclin (Cedarlane
labs, Burlington, Ontario, Canada) antibodies were ob- 
tained commercially. The antibodies to BiP were gener- 
ously provided by Jay Bangs. Cells were washed five 
times, before the addition of secondary antibody and 
5 µg/ml Hoechst (Cell Signaling Technology, Inc,
Beverly, MA, USA). Cells were exposed to secondary 
antibody, diluted in 2% goat serum, for one hour. All 
secondary antibodies (Alexa Fluor 488-conjugated goat
anti-mouse, 594-conjugated goat anti-mouse, 488-conjugated
goat anti-rabbit and 594-conjugated goat anti-rabbit
(Invitrogen)) were used at a 1:1,000 dilution.
Samples were washed five times for 10 minutes total. 
Wash solution was removed and FluorSave (Calbiochem, 
Darmstadt, Germany) was added. Next, the coverslip was 
placed on the slide, and FluorSave reagent dried for two 
hours to overnight before cells were imaged on a Zeiss 
(Jena, Germany) Axioplan 2 fluorescence microscope. For 
kDNA size estimation, cells were fixed and stained with 
Hoechst, following the protocol outlined above. Cells were 
imaged and kDNA size was compared visually between 
un-induced cells and induced cells. 

MitoTracker 

A total of 1 × 10^7 cells was spun down, washed once in
cell wash (20 mM Tris-HCl (pH 7.5), 100 mM NaCl
and 3 mM MgCl2) and resuspended in Cunningham's
media with no added serum. MitoTracker Red CM-H2XRos
(Invitrogen) was added to a final concentration
of 1 µM. Cells were incubated at 28°C and 5% CO2 for
10 to 15 minutes, centrifuged, washed and placed in
fresh media without serum. Cells were allowed to recover in media
without MitoTracker for 25 minutes, then Hoechst
(5 µg/ml) was added; cells were spun down and finally
resuspended in 20 to 50 µl of PBSG.

Live cell imaging 

A total of 5 to 8 × 10^6 cells was collected, Hoechst
(5 µg/ml) was added and the cells were incubated in the
dark for two minutes. Cells were centrifuged, washed
once in phosphate-buffered saline with glucose (PBSG),
resuspended in 20 to 50 µl of PBSG, and imaged on a
Zeiss Axioplan 2 fluorescence microscope. 

Cell fractionations 

Cell compartment Qproteome kit (Qiagen)

Each fractionation used 1 × 10^9 cells expressing GFP-tagged
Tb11.NT.29 and the manufacturer's (Qiagen,
Venlo, Limburg, The Netherlands) instructions were
followed. The anti-HSP70 and anti-BiP antibodies were
generously provided by Jay Bangs. 

Mitochondrial isolation Qproteome kit (Qiagen)

Each fractionation used 1 × 10^9 cells expressing GFP-tagged
Tb10.NT.87 or Tb11.NT.28 and the manufacturer's
(Qiagen) instructions were followed. The antibodies to
REAP and TbMP63 were generously provided by Steve
Hajduk and Ken Stuart, respectively.

Digitonin solubilization assay 

Cells (1 × 10^8) expressing GFP-tagged Tb10.NT.87 or
HA-tagged Tb11.NT.28 were spun down for each assay,
washed twice with 20 mM sodium phosphate (pH 7.9),
20 mM glucose and 0.15 M NaCl, and then resuspended
in 500 µl SoTE buffer (20 mM Tris-HCl (pH 7.5), 0.6 M
sorbitol, 2 mM EDTA). Next, 500 µl of SoTE buffer with
varying digitonin amounts was added to each sample to
a final concentration of detergent of 0.015%, 0.025%,
0.04%, 0.05% or 0.1% [48,49]. Samples were incubated
for five minutes at 4°C followed by centrifugation for
three minutes at 5,000 g at 4°C. SDS-sample buffer was
added to the supernatants and samples were analyzed by 
SDS-PAGE and Western blotting. The antibodies to 
TAO and mtHSP70 were generously provided by Minu 
Chaudhuri and Jay Bangs, respectively. 
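
Because the digitonin-containing buffer is mixed 1:1 with the cell suspension, the added buffer must be more concentrated than the desired final value. The following sketch works through that dilution arithmetic; it assumes the cell suspension itself contains no digitonin, and the function name is hypothetical.

```python
def added_buffer_concentration(final_percent,
                               sample_volume_ul=500.0,
                               added_volume_ul=500.0):
    """Digitonin concentration (%) needed in the added buffer.

    Assumes the sample contains no digitonin before mixing.
    """
    total_volume_ul = sample_volume_ul + added_volume_ul
    return final_percent * total_volume_ul / added_volume_ul


# Final concentrations listed in the text and the matching buffer strengths.
for final in (0.015, 0.025, 0.04, 0.05, 0.1):
    print(final, added_buffer_concentration(final))
```

With equal volumes of sample and buffer, the added buffer simply needs twice the intended final digitonin concentration.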

Electron microscopy 

Samples were fixed in 4% PFA/0.1% glutaraldehyde in PBS
for 30 minutes followed by further fixation in 4% PFA for
one hour, rinsed in PBS, scraped and re-suspended in 10%
gelatin. Chilled blocks were trimmed, placed in 2.3 M sucrose
overnight on a rotor at 4°C, transferred to aluminum
pins and frozen rapidly in liquid nitrogen. The frozen
blocks were cut on a Leica Cryo-EM UC6 UltraCut and
65 nm thick sections were collected using the Tokuyasu
method [64] and placed on carbon/formvar coated grids
and floated in a dish of PBS for immunolabeling. Grids
were placed section side down on drops of 0.1 M ammonium
chloride to quench untreated aldehyde groups, then
blocked for nonspecific binding on 1% fish skin gelatin in
PBS. Single labeled grids were incubated on a primary antibody
mouse anti-HA (Covance) 1:50 dilution, which required
a rabbit anti-mouse bridge (JacksonImmuno, West
Grove, PA, USA). The secondary antibody was 10 nm Protein
A gold (Utrecht Medical Center). All grids were rinsed
in PBS, fixed using 1% glutaraldehyde for five minutes,
rinsed again and transferred to a UA/methylcellulose drop
before being collected and dried. Samples were viewed
using a FEI Tecnai Biotwin TEM at 80 kV. Images were
taken using Morada CCD and iTEM (Olympus) software.

Additional files 



essential small protein from procyclic cells un-induced (-) or induced (+) for 
RNAi. Histone 4 (H4) was used as a control. The small protein transcript and 
H4 are indicated to the right of each panel. Figure S6. Listing of the seven 
essential proteins in procyclics with a C-terminal GFP or HA tag. All 
constructs rescued the lethal RNAi phenotype. Figure S7. Western blot 
analysis of GFP-tagged proteins encoded by an RNAi-resistant construct. 
Size markers (kDa) are shown at the left. Figure S8. Growth analysis of 
bloodstream-form cells following RNAi against four small proteins. Growth 
for un-induced (tet-) and induced (tet+) cells are shown in log scale. The 
data are based on three independent experiments. Figure S9. Listing of 
oligonucleotides used in this study. 

Additional file 3: Listing of the 178 transcripts (sheet 178 ORFs) 
and 42 transcripts (sheet 42 ORFs) encoding potential ORFs with 
matching peptides and hits in Kinetoplastids. Panigrahi et al.: A
comprehensive analysis of Trypanosoma brucei mitochondrial proteome.
Proteomics 2009, 9:434-450 [21]. Butter et al.: Comparative proteomics of
two life cycle stages of stable isotope-labeled Trypanosoma brucei reveals 
novel components of the parasite's host adaptation machinery. Mol Cell 
Proteomics 2013, 12:172-179 [22]. viv, T. vivax; cru, T. cruzi; Leish, 
Leishmania; con, T. congolense; evansi, T. evansi. PTM Abbreviations: P, 
phosphorylation; S, sumoylation; PM, palmitoylation; G, glycosylation. 
Phosphorylation prediction: NetPhos http://www.cbs.dtu.dk/services/ 
NetPhos/; Sumoylation prediction: http://sumosp.biocuckoo.org/; 
Palmitoylation Prediction: http://csspalm.biocuckoo.org/; Glycosylation 
Prediction: http://www.cbs.dtu.dk/services/YinOYang/. 

Additional file 4: T. brucei predicted ORFs conserved in representative 
eukaryotes. BLAST bit scores of T. brucei predicted ORFs with potential 
homologs in representative eukaryotes. Bit scores are shown after the 
accession number and they represent a normalized version of the raw 
BLAST alignment score. 

Additional file 5: Survey of conserved domains in 173 T. brucei
predicted ORFs.
in Figure 1. Setting a lower limit of 25 aa, all the potential ORFs are
according to the nomenclature by Kolev et al. [28] and _1 indicates ORF
#1. The numbers in parentheses specify the CDS.
following induction of RNAi with and without expression of an
RNAi-resistant construct. The data are based on three independent
experiments. Figure S4. Northern blot analysis of essential ORFs. Northern
blots for six (Tb11.NT.108, Tb10.NT.90, Tb10.NT.86, Tb10.NT.87, Tb11.NT.28 and